Clustered Data Parallelism
Authors
Abstract
Many data layout optimizations cluster data accesses and memory into high-locality groups in order to optimize for the memory hierarchy. In this paper, we demonstrate that similar clustering program transformations enable efficient vectorization. We call this approach clustered data parallelism (CDP). CDP enables fast and power-efficient parallelism by partitioning a data structure into clusters such that SIMD evaluation is efficient within a cluster. We describe the CDP latent in three common computational patterns: map, reduce, and graph traversals. Demonstrating the benefits of CDP, we present case studies of instantiating the CDP patterns in order to design fast and power-efficient binary search and webpage layout algorithms. First, we increase binary search SIMD scalability by using CDP to expose speculative parallelism. Second, we achieve the first SIMD webpage layout algorithm by using CDP to eliminate heavy branching. We report strong performance improvements. Targeting AVX, we see a 5.5X speedup and 6.9X performance/Watt increase over FAST, the previously fastest SIMD binary search algorithm. Running webpage layout with SSE4.2 instructions, we observe a 3.5X speedup and 3.6X performance/Watt increase over an already optimized baseline.
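To make the core idea concrete, the following is a minimal sketch of clustered data parallelism applied to sorted search, in the spirit of the binary search case study but not the paper's actual algorithm: the sorted array is partitioned into small fixed-size clusters, and within a cluster every comparison is evaluated speculatively with a single SIMD compare instead of a branchy scalar search. The cluster size, function names, and use of SSE2 intrinsics below are illustrative assumptions.

    /* Illustrative sketch (not the paper's algorithm): sorted data is split
       into 4-key clusters; inside a cluster all comparisons are evaluated
       speculatively with one SSE2 compare. Requires GCC/Clang on x86. */
    #include <emmintrin.h>   /* SSE2 intrinsics */
    #include <stdio.h>

    #define CLUSTER 4        /* keys per cluster: one 128-bit vector of int32 */

    /* Index of the first element >= key inside one sorted 4-key cluster. */
    static int cluster_lower_bound(const int *cluster, int key) {
        __m128i keys = _mm_loadu_si128((const __m128i *)cluster);
        __m128i q    = _mm_set1_epi32(key);
        __m128i lt   = _mm_cmpgt_epi32(q, keys);        /* lanes where cluster[i] < key */
        int mask = _mm_movemask_ps(_mm_castsi128_ps(lt));
        return __builtin_popcount(mask);                 /* # of keys smaller than key */
    }

    /* Lower bound over a sorted array whose length is n_clusters * CLUSTER:
       scalar binary search over clusters, SIMD evaluation inside a cluster.
       Returns n_clusters * CLUSTER when key is larger than every element. */
    int clustered_search(const int *a, int n_clusters, int key) {
        int lo = 0, hi = n_clusters - 1;
        while (lo < hi) {
            int mid = (lo + hi) / 2;
            if (a[mid * CLUSTER + CLUSTER - 1] < key) lo = mid + 1;
            else hi = mid;
        }
        return lo * CLUSTER + cluster_lower_bound(a + lo * CLUSTER, key);
    }

    int main(void) {
        int a[] = {1, 3, 5, 7, 9, 11, 13, 15};            /* 2 clusters of 4 keys */
        printf("%d\n", clustered_search(a, 2, 9));        /* prints 4 */
        return 0;
    }

Larger clusters (for example, one or two cache lines) and wider vectors such as AVX follow the same pattern; only the intra-cluster comparison step changes.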
Similar resources
Stream Execution on Embedded Wide-Issue Clustered VLIW Architectures
Very long instruction word (VLIW) based processors have become widely adopted as a basic building block in modern System-on-Chip designs. Advances in clustered VLIW architectures have extended the scalability of the VLIW architecture paradigm to a large number of functional units and very-wide-issue widths. A central challenge with wide-issue clustered VLIW architectures is the availability of pr...
Code Optimization of Polynomial Approximation Functions on Clustered Instruction-level Parallelism Processors
In this paper, we propose a general code optimization method for implementing polynomial approximation functions on clustered instruction-level parallelism (ILP) processors. In the proposed method, we first introduce the parallel algorithm with minimized data dependency. We then schedule and map the data dependency graph (DDG) constructed based on the parallel algorithm to appropriate clusters ...
Extending Task Parallelism For Frequent Pattern Mining
Algorithms for frequent pattern mining, a popular informatics application, have unique requirements that are not met by any of the existing parallel tools. In particular, such applications operate on extremely large data sets and have irregular memory access patterns. For efficient parallelization of such applications, it is necessary to support dynamic load balancing along with scheduling mech...
Dynamically Matching ILP Characteristics Via a Heterogeneous Clustered Microarchitecture
Applications vary in the degree of instruction level parallelism (ILP) available to be exploited by a superscalar processor. The ILP can also vary significantly within an application. On one end of the microarchitecture space are monolithic superscalar designs that exploit parallelism within an application. At another end of the spectrum are clustered architectures having many simple cores that...
Combining Task- and Data Parallelism to Speed up Protein Folding on a Desktop Grid Platform: Is efficient protein folding possible with CHARMM on the United Devices MetaProcessor?
The steady increase of computing power at lower and lower cost enables molecular dynamics simulations to investigate the process of protein folding with an explicit treatment of water molecules. Such simulations are typically done with well-known computational chemistry codes like CHARMM. Desktop grids such as the United Devices MetaProcessor are highly attractive platforms, since scavenging fo...
Publication date: 2012